[00:00.640 --> 00:06.560]  Alright, welcome everyone. My name is Jack Baker, and I'm going to be talking about bugs in Game
[00:06.560 --> 00:11.760]  Engines. So as a little bit of background, over the last few months, I've found more than 10
[00:11.760 --> 00:15.940]  remotely exploitable bugs while looking at two different game engines. And I'm going to be
[00:15.940 --> 00:22.420]  talking about four of those bugs today. So just to level set things, when I use the term Game
[00:22.420 --> 00:29.500]  Engine, I'm referring to the base software that most games are built on top of. So if you're
[00:29.500 --> 00:34.260]  building a video game, you're probably not doing it from scratch. You're using a pre-made set of
[00:34.260 --> 00:40.900]  tools and software that are built to make that process easier. And that software is called your
[00:40.900 --> 00:47.320]  Game Engine. And so the popularity of many Game Engines means that a lot of games share the exact
[00:47.320 --> 00:53.160]  same bugs. This is made worse by the fact that updating your Game Engine can be a huge pain,
[00:53.160 --> 00:58.840]  as anyone who's done a lot of game dev can probably tell you. And games don't usually
[00:58.840 --> 01:05.080]  get security patches after release. Maybe a bigger game will have some support, but an
[01:05.080 --> 01:11.340]  independent release is not going to get patches a year or two years after release just to fix some
[01:11.340 --> 01:18.580]  security bug. So there aren't a whole lot of good statistics on what Game Engines are the most
[01:18.580 --> 01:23.300]  popular, but there's a general understanding that two of them are more common than others.
[01:23.300 --> 01:30.040]  Those are Unreal Engine 4 and Unity. So as sort of a rule of thumb, if you're a solo developer
[01:30.040 --> 01:35.440]  or a small team, there's a pretty good chance you're using Unity. Whereas if you're a larger
[01:35.440 --> 01:40.960]  team but not so large that you've built your own engine from scratch, you're probably using Unreal
[01:40.960 --> 01:49.100]  Engine 4. So Unreal Engine 4 is created by Epic Games. It's named for its roots in the Unreal
[01:49.100 --> 01:54.800]  series. It is open source. There are some licensing restrictions on how you can use that source,
[01:54.800 --> 01:59.900]  but from our perspective of just looking for bugs, all the code is out there for us to look at.
[02:00.280 --> 02:03.740]  And there are really two big notable games, I think, right now
[02:03.740 --> 02:08.700]  that are built using Unreal Engine 4, and those are Fortnite and PUBG.
[02:10.560 --> 02:16.060]  Unity is built by Unity Technologies. There are some open source components of it,
[02:16.060 --> 02:21.840]  but the core components of Unity are closed source. The core networking library we're going
[02:21.840 --> 02:26.600]  to be looking at is called UNet. And while I couldn't find a whole lot of big games that
[02:26.600 --> 02:31.760]  are using UNet, there are countless indie releases on Steam and everywhere else that
[02:31.760 --> 02:39.180]  are built using UNet. Now I should say that UNet is deprecated, but there are a few reasons that I
[02:39.180 --> 02:44.940]  thought it would be a good target anyway. The first is that UNet doesn't have an official replacement
[02:44.940 --> 02:51.260]  yet. Unity hasn't put out an alternative yet. So if you're using Unity and you're doing multiplayer,
[02:51.260 --> 02:57.580]  you're either using UNet still or you're using a third-party solution. Also, UNet still does
[02:57.580 --> 03:03.560]  receive patches and occasionally new features, even though it's deprecated. For example, the encryption
[03:03.560 --> 03:11.380]  API was added to UNet after deprecation. And a ton of new, and more importantly I think,
[03:11.380 --> 03:17.780]  existing games use UNet. So I think bugs in UNet still have some value.
[03:18.400 --> 03:25.840]  So let's talk about multiplayer protocols. As game engines have evolved, and as multiplayer
[03:25.840 --> 03:31.280]  protocols have evolved, there's really been a focus on two things. The first is obvious, it's
[03:31.280 --> 03:37.240]  increasing performance, increasing speed. The other is moving trust away from the client
[03:37.840 --> 03:44.520]  in order to prevent hacking. But as we're going to see, these can sometimes be conflicting goals.
[03:45.020 --> 03:50.680]  But to really understand multiplayer protocols, I think it's worth understanding the types of
[03:50.680 --> 03:55.820]  attacks that they're aiming to prevent. And I think the best example of this is to talk about
[03:55.820 --> 04:02.820]  the evolution of what we'll call movement hacking. So movement hacking is the process of manipulating
[04:02.820 --> 04:08.480]  the player's location in some way that the game normally wouldn't allow. In the old days,
[04:08.480 --> 04:13.240]  this was really easy because player location was just trusted to the client. So if you
[04:13.240 --> 04:19.960]  manipulate your location client side, you can teleport server side. And this got more difficult
[04:19.960 --> 04:26.940]  as game engines got more complicated and trust was taken from the client and put on the server
[04:26.940 --> 04:34.500]  for player location. So the client can no longer say, I'm at XYZ. All the client can do is make
[04:34.640 --> 04:40.660]  a request saying, I would like to move, I'm moving in this direction at this speed, and the server
[04:40.660 --> 04:48.040]  will update their position accordingly. This led to a new type of attack called speed hacking. So
[04:48.040 --> 04:53.180]  it's sort of the next evolution of movement hacking. Speed hacking, the goal is not to
[04:53.180 --> 04:59.560]  teleport necessarily, but just to move extremely fast. And typically the way this works is you
[04:59.560 --> 05:05.920]  send that movement request excessively fast, faster than a normal client ever would,
[05:05.920 --> 05:13.360]  and more requests then means more speed. This was prevented by basically giving more authority and
[05:13.360 --> 05:19.980]  more context to the server. So the server should be able to understand what is a realistic distance
[05:19.980 --> 05:26.580]  that a character can move in a given time frame. And if a client is attempting to move
[05:26.580 --> 05:33.680]  beyond that distance, it can stop that from happening. And so by giving this context to
[05:33.680 --> 05:39.200]  the server, we're able to prevent this type of attack. And this is really how game engines have
[05:39.200 --> 05:44.920]  evolved. It's been a constant process of moving trust away from the client and giving more
[05:44.920 --> 05:52.880]  authority, more responsibility, and more context to the server to understand what is normal.
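As a rough illustration of what "giving context to the server" means, here is a minimal Python sketch of a server-side plausibility check on movement. The names (`MAX_SPEED`, `is_plausible_move`) and the numbers are invented for this example; no particular engine is claimed to work exactly this way.

```python
import math

MAX_SPEED = 6.0  # hypothetical top speed, in world units per second


def is_plausible_move(old_pos, new_pos, elapsed_seconds, max_speed=MAX_SPEED):
    """Return True if moving old_pos -> new_pos in elapsed_seconds is realistic."""
    # Written as "not (x > 0)" so that zero, negative and NaN times are all rejected.
    if not (elapsed_seconds > 0):
        return False
    distance = math.dist(old_pos, new_pos)
    # The server, not the client, decides how far a character may travel.
    return distance <= max_speed * elapsed_seconds
```

With these made-up numbers, a move of 3 units in one second is accepted, while a move of 100 units in one second is rejected as a teleport or speed hack.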
[05:53.600 --> 05:59.680]  So with all of this said, let's talk about some of the technical specifics of multiplayer protocols.
[05:59.680 --> 06:04.500]  Now there's no real published standard of what a multiplayer protocol looks like,
[06:04.500 --> 06:08.980]  but there are a few things that are pretty consistent between different protocols. And
[06:08.980 --> 06:15.160]  the first of those is that most multiplayer protocols use some form of distributed architecture.
[06:15.840 --> 06:21.100]  And essentially what this means is each system, whether that be a client or a server, each system
[06:21.100 --> 06:27.440]  has a copy of every networked object, every object that's connected to the network in the game world.
[06:27.440 --> 06:32.800]  And actions between objects and between systems are performed through what are called remote
[06:32.800 --> 06:40.040]  procedure calls. So remote procedure calls, as the name sort of suggests, are a way of calling
[06:40.040 --> 06:46.020]  functions on a remote system as if you were calling them locally. So this is really easy for
[06:46.020 --> 06:51.500]  the programmer because it's just like you're calling a regular function of your code. The
[06:51.500 --> 06:56.780]  difference is that that function is executing on someone else's system, not your local system.
[06:56.800 --> 07:01.620]  This is really convenient, but there's a lot of complexity that goes into this process on the
[07:01.620 --> 07:08.200]  back end. And so with this concept of the distributed architecture usually comes some
[07:08.200 --> 07:14.220]  sort of concept of ownership, where owning an object usually means having the authority to
[07:14.220 --> 07:21.440]  issue RPCs on that object. And so the way this usually works is each player has ownership of
[07:21.440 --> 07:27.360]  their character and maybe some associated sub-objects like your inventory. But player A
[07:27.360 --> 07:34.760]  can only issue RPCs on character A. He can't issue RPCs on character B and vice versa.
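The ownership rule described above can be sketched in a few lines of Python. The class and function names are hypothetical and only illustrate the idea; real engines track ownership with considerably more machinery.

```python
class NetworkedObject:
    """A networked object that every client and the server hold a copy of."""

    def __init__(self, object_id, owner_id):
        self.object_id = object_id
        self.owner_id = owner_id


def may_issue_rpc(sender_id, target):
    """A connection may only issue RPCs on objects it owns."""
    return target.owner_id == sender_id


character_a = NetworkedObject(object_id=1, owner_id="player_a")
character_b = NetworkedObject(object_id=2, owner_id="player_b")
```

Here `may_issue_rpc("player_a", character_a)` is allowed, while `may_issue_rpc("player_a", character_b)` is refused.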
[07:35.360 --> 07:41.160]  Another interesting technical detail of multiplayer protocols is that they're usually
[07:41.160 --> 07:47.580]  implemented over UDP. The one major exception to this is browser games where you can't access
[07:47.580 --> 07:52.080]  operating system sockets, so you have to use something like WebSockets. But when we're talking
[07:52.080 --> 07:58.240]  about desktop games, we're usually talking about protocols that are implemented over UDP.
[07:58.240 --> 08:05.160]  And this puts extra requirements on the protocol itself because UDP won't do things like validate
[08:05.160 --> 08:10.340]  that a packet is part of a session that you've already authenticated or identify when a packet
[08:10.340 --> 08:15.920]  is either a duplicate or out of order. So the protocol gets more complex because it has to
[08:15.920 --> 08:21.640]  deal with these types of problems itself. So now that we've talked a little bit about how
[08:21.640 --> 08:26.480]  these protocols work, let's look at our first bug. And this is an Unreal Engine 4 bug, and it's
[08:26.480 --> 08:33.340]  actually a file pathing bug. So Unreal Engine uses its own type of URL, and it uses this to
[08:33.340 --> 08:38.280]  communicate details between the server and the client. So this can be stuff like the server can
[08:38.280 --> 08:44.940]  send a URL that says, we're playing on this map, you need to load these packages. Or the client
[08:44.940 --> 08:51.500]  could send something saying, I'm joining the game, my player name is Jack, and I'm joining with two
[08:51.500 --> 08:58.000]  other players who are playing split screen on the same computer. And so one of these URLs might look
[08:58.000 --> 09:03.300]  something like this. You start with the IP address of the server, then you've got a file path that
[09:03.300 --> 09:10.660]  corresponds to a particular asset, and then just like an HTTP URL, you've got key value pairs that
[09:10.660 --> 09:17.660]  are separated by an equal sign. So the bug here is pretty simple. If you use a malicious URL, you can
[09:17.660 --> 09:25.020]  cause a server or a client to access any local file path. It's not going to try to write to that
[09:25.020 --> 09:30.940]  file or even read from that file, it's just going to try to determine if that file exists. This is
[09:30.940 --> 09:35.260]  pretty boring on its own, but it gets more interesting when we start talking about Universal
[09:35.260 --> 09:42.660]  Naming Convention, or UNC paths. So UNC paths are special Windows paths that are used to access
[09:42.660 --> 09:50.540]  networked resources as if they were regular files. So regular local files. Typically a UNC
[09:50.540 --> 09:56.260]  path will look something like this. You've got two slashes, then the host name or IP address,
[09:56.260 --> 10:00.960]  another slash, then the share name, then another slash, and then either the file name or the full
[10:00.960 --> 10:09.420]  file path. So using this crafted URL, or one that looks like it, we can cause a server or client to
[10:09.420 --> 10:15.160]  connect to a remote SMB share. This is pretty simple. The one trick here is we do need to
[10:15.160 --> 10:22.440]  include the part that says .umap, where umap is a file extension used by Unreal Engine. And by
[10:22.440 --> 10:28.360]  having this somewhere in the domain name, here we use it as a subdomain, we can bypass some of the
[10:28.360 --> 10:37.440]  filtering. And if we provide this URL to a system, it will cause it to make an outgoing SMB
[10:37.440 --> 10:45.160]  connection and try to determine the existence of this hi.txt file. And so what this does is it
[10:45.160 --> 10:52.060]  opens up the affected system to the world of SMB related attacks. Typically when a Windows system
[10:52.060 --> 10:56.760]  connects to an SMB share, it'll try to authenticate with it. So this can be used for credential
[10:56.760 --> 11:02.120]  harvesting or authentication relaying. There's a whole world of different attacks using this.
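As a rough illustration of the kind of check that closes off this class of bug, here is a small Python sketch that flags UNC-style paths before anything touches the filesystem. The function name is made up for this example, and this is not the actual Unreal Engine fix, only the general idea.

```python
def looks_like_unc_path(path):
    """Return True if path starts with two slashes or two backslashes (UNC style)."""
    # Windows accepts forward slashes in many path APIs, so normalize first.
    normalized = path.replace("/", "\\")
    return normalized.startswith("\\\\")
```

A caller would refuse to probe any path for which this returns True, so a crafted map name never turns into an outgoing SMB connection.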
[11:02.120 --> 11:07.940]  This can also be used pretty trivially as a server denial of service because the whole server will lock
[11:07.940 --> 11:13.780]  up as it's making this request. So if you cause that SMB connection to deliberately take a long
[11:13.780 --> 11:22.040]  time, you can lock up the server for a while. So this was fixed in Unreal Engine 4.25.2. With this
[11:22.040 --> 11:27.640]  commit, it's really easy to backport this. So if you're on an older version, this isn't too hard to
[11:27.640 --> 11:33.740]  apply. So this is a pretty fun bug, but let's talk about something a little bit flashier. And let's
[11:33.740 --> 11:41.600]  start talking about our first UNet bug. So UNet packets are packed in such a format that you can
[11:41.600 --> 11:48.480]  put multiple RPCs into a single packet. And that looks a little bit like this. So the first thing
[11:48.480 --> 11:53.940]  you've got is the packet header, which for our purposes, we don't really care about. But then
[11:53.940 --> 12:01.820]  you have each message within that packet. And each message consists of a 16-bit message length,
[12:02.000 --> 12:08.300]  a 16-bit value that I call the message type, and then the message body, the actual contents
[12:08.300 --> 12:13.500]  of that message. And then at the end of that body, you see you have the next message just concatenated
[12:13.500 --> 12:21.640]  on. The bug here, again, is pretty simple. If we supply a message size that's larger than the actual
[12:21.640 --> 12:27.080]  payload of our message, the actual body of our message, we can convince the server to act on
[12:27.080 --> 12:33.240]  extra data that's already in memory. So that looks a little something like this. If we imagine this
[12:33.240 --> 12:39.780]  is just a regular RPC, we start with the length, and then we've got the body, which is four bytes.
[12:39.780 --> 12:45.840]  But if we were to manipulate that length without actually increasing the length of our body,
[12:45.840 --> 12:50.720]  we'll see that all this other data that was already in memory, that's not part of our message,
[12:50.720 --> 12:57.660]  is now within scope of our RPC. So what's interesting about this is that this old memory
[12:57.660 --> 13:03.920]  actually comes from previous RPCs, not just from ours, but from other connections, from other
[13:03.920 --> 13:10.840]  players. So what we want to do here is we want to create an RPC that will leak this old memory to us
[13:10.840 --> 13:18.560]  kind of like Heartbleed would. And so to do this, it's not enough to just cause the server to
[13:18.560 --> 13:25.100]  act on that old memory. We need to convince the server to actually send that memory back to us.
[13:25.160 --> 13:31.940]  And so the type of RPC that is really good for this is chat messages. So let's look at what an
[13:31.940 --> 13:37.700]  example chat RPC might look like. In this case, again, we've got the length, and then we've got
[13:37.700 --> 13:43.620]  the body of the RPC, which is just made up of a string length and then a string body. But if we
[13:43.620 --> 13:50.060]  just apply that string length and we don't supply a body, then all the rest of that memory is going
[13:50.060 --> 13:57.140]  to be treated as part of that string. And if that RPC is accepted, the server is going to send back
[13:57.280 --> 14:02.100]  a message that says there's a new chat message and here's its content. And those contents will
[14:02.100 --> 14:10.320]  contain data from previous RPCs. But even in the absence of chat messages, there are some other
[14:10.320 --> 14:16.500]  RPCs we can use. Movement or spawning a new object are both good ones because they typically
[14:17.040 --> 14:25.340]  involve giving some form of vector, some location, as an argument. And we can use this to leak either
[14:25.340 --> 14:32.420]  8 or 12 bytes, depending on whether or not this vector is 2D or 3D, if it's a 3D game or a 2D game.
[14:32.420 --> 14:36.840]  And then there's always game-specific RPCs that we could potentially use for this.
[14:36.840 --> 14:42.960]  We just have to get creative. So let's talk about what we can leak with this. Now,
[14:43.480 --> 14:47.420]  there can be a lot of things. If you're doing authentication over UNet, this could be
[14:47.420 --> 14:53.040]  passwords. It could also be private messages. It could be game-specific stuff like player locations
[14:53.040 --> 14:59.760]  or player actions. Really anything that gets sent over UNet by the client could be leaked this way.
[14:59.800 --> 15:06.220]  And this was fixed pretty recently at the end of May with UNet version 1.0.6.
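To make the over-read concrete, here is a minimal Python sketch of a parser for the message layout described above (16-bit length, 16-bit type, then the body) with the bounds check that prevents the leak. The byte order, the assumption that the length counts only the body, and all names are illustrative guesses; this is not UNet's actual code.

```python
import struct

# Assumed layout: 16-bit length, 16-bit message type, network byte order.
HEADER = struct.Struct("!HH")


def parse_messages(payload):
    """Split payload into (msg_type, body) tuples, rejecting overlong lengths."""
    messages = []
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < HEADER.size:
            raise ValueError("truncated message header")
        length, msg_type = HEADER.unpack_from(payload, offset)
        offset += HEADER.size
        # The key check: never trust a declared length that runs past the
        # bytes actually received in this packet.
        if length > len(payload) - offset:
            raise ValueError("declared length exceeds received data")
        messages.append((msg_type, payload[offset:offset + length]))
        offset += length
    return messages
```

A message that claims 64 bytes of body but carries only 4 is rejected here, rather than being padded out with whatever happened to be left in the receive buffer.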
[15:07.020 --> 15:11.640]  So I went kind of quickly through those first two, but I really wanted to save time for these
[15:11.640 --> 15:17.360]  last two bugs, which I think warrant going into a little bit more technical depth. This first one
[15:17.360 --> 15:24.040]  is another Unreal Engine bug. And I really like this bug because it is a universal speed hack for
[15:24.040 --> 15:31.280]  Unreal Engine. So we talked just briefly about this, but movement in Unreal Engine is what we
[15:31.280 --> 15:38.580]  call server authoritative. That means that the client cannot just say, I'm at XYZ. Instead,
[15:38.580 --> 15:46.040]  the client has to issue a movement RPC and ask the server to update the player's location.
[15:47.740 --> 15:53.660]  And so this movement RPC has two important parts to it. I am oversimplifying this quite a bit,
[15:53.660 --> 15:59.800]  but I only want to talk about the parts that are actually relevant to our bug. So the first
[15:59.800 --> 16:04.900]  argument is the movement vector. And this is a vector that says the direction that we're moving
[16:04.900 --> 16:10.100]  and the speed we're moving at. And the second argument is a timestamp that just says we sent
[16:10.100 --> 16:19.460]  this RPC at this time relative to client time. And so the actual math here is that the server
[16:19.460 --> 16:25.780]  will calculate what's called a movement delta by taking our provided timestamp and subtracting
[16:25.780 --> 16:35.220]  from it the last valid timestamp that we provided to the server. And then it's going to
[16:35.220 --> 16:41.640]  take our movement vector and multiply it by our movement delta to calculate the actual movement
[16:41.640 --> 16:47.820]  that should be applied. And so if the server can properly validate that both the movement delta
[16:47.820 --> 16:53.350]  and the movement vector are sane, there's not a whole lot we can do to manipulate our movement.
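The arithmetic just described, as a tiny Python sketch. The function name and the tuple representation of vectors are illustrative, and the real RPC carries more than these two arguments.

```python
def apply_movement(position, movement_vector, timestamp, last_valid_timestamp):
    """Advance position by movement_vector scaled by the elapsed client time."""
    movement_delta = timestamp - last_valid_timestamp
    return tuple(p + v * movement_delta for p, v in zip(position, movement_vector))
```

For example, a movement vector of (2, 0, 0) with timestamps 10.0 and then 10.5 gives a delta of 0.5 and moves the character one unit along x.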
[16:54.110 --> 16:58.070]  Now we need to make a slight digression and talk about floating point.
[16:58.530 --> 17:05.210]  So when I say floating point, I'm specifically referring to IEEE 754. And this is how most
[17:05.210 --> 17:10.310]  computer systems represent rational numbers, numbers that can be non-whole, such as 1.234
[17:10.310 --> 17:17.990]  or 12.34, anything with a decimal point in it. And so what's interesting about floating
[17:17.990 --> 17:23.530]  point is it has some special values. Floating point can be used to represent infinity, either
[17:23.530 --> 17:28.990]  positive or negative. It also has a special value called not a number, which I choose to verbalize
[17:28.990 --> 17:35.490]  as NAN. And for completeness' sake, I should say that NAN can be positive or negative, but it
[17:35.490 --> 17:42.210]  doesn't typically matter. So these special values usually are the result of some undefined mathematical
[17:42.210 --> 17:49.890]  operation. So if you do any non-zero number divided by zero in floating point, you get
[17:49.890 --> 17:55.030]  infinity, either positive or negative. If you do zero divided by zero in floating point, you get
[17:55.030 --> 18:02.410]  NAN. Similarly, if you do square root of negative one, you get NAN. Now NAN is really the more
[18:02.410 --> 18:08.030]  interesting of these two special values, and it has a couple properties that are really unique.
[18:08.030 --> 18:14.050]  The first is that any affirmative comparison against NAN will evaluate to false. So we can
[18:14.050 --> 18:20.610]  compare NAN to zero in any way we want. It'll always be false. We can do NAN equals equals zero.
[18:20.610 --> 18:26.110]  False. Is NAN greater than zero? False. Is NAN less than zero? False. We can even compare NAN
[18:26.110 --> 18:33.350]  to itself, and it'll still be false. The other thing is that NAN has a tendency to propagate,
[18:33.350 --> 18:40.930]  by that I mean any mathematical operation where NAN is an operand will evaluate to NAN. So NAN
[18:40.930 --> 18:47.450]  plus one, minus one, times two, divided by two, these are all NAN. And no matter how complex you
[18:47.450 --> 18:53.950]  make this mathematical operation, if one operand is NAN, the entire thing will become NAN.
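Both properties are easy to see in a short Python sketch. Python raises an exception for a float division by zero instead of returning NaN, so the example starts from `float("nan")` directly.

```python
import math

nan = float("nan")

# Every affirmative comparison against NaN is false, even against itself.
comparisons = [nan == 0, nan > 0, nan < 0, nan == nan]

# NaN propagates through arithmetic: each of these results is NaN.
arithmetic = [nan + 1, nan - 1, nan * 2, nan / 2]
all_nan = all(math.isnan(x) for x in arithmetic)
```

Every entry in `comparisons` is `False`, and `all_nan` is `True`. The reliable way to detect NaN is an explicit test such as `math.isnan`.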
[18:54.890 --> 19:01.670]  So this brings up the term NAN poisoning, which is a condition where the unique properties of NAN
[19:01.670 --> 19:07.570]  cause some sort of unintended effect. So let's look at the following code as a really simple
[19:07.570 --> 19:13.990]  example of this. So in this case, the programmer is trying to ensure that the floating point number
[19:13.990 --> 19:24.650]  NUM is between zero and 100. The problem here is that if NUM is NAN, both of these conditions
[19:24.650 --> 19:33.330]  will evaluate to false because any affirmative comparison with NAN will evaluate to false.
[19:33.330 --> 19:39.570]  So NAN will pass these validations and we will end up acting on NAN in some way,
[19:39.570 --> 19:45.930]  assuming that it is a legitimate regular floating point number within the range zero and 100.
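The slide being discussed is not part of the transcript, so the following Python sketch is only a reconstruction of that style of check, with made-up function names. It contrasts a validator that rejects out-of-range values with one that accepts in-range values.

```python
def validate_naive(num):
    """Reject values that compare as out of range; NaN slips through."""
    if num < 0:
        return False
    if num > 100:
        return False
    return True


def validate_strict(num):
    """Accept only values that compare as in range; NaN is rejected."""
    return 0 <= num <= 100
```

With NaN as input, both comparisons in `validate_naive` are false, so it returns `True`. In `validate_strict` the comparison itself has to succeed, so NaN is rejected without needing a special case.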
[19:47.650 --> 19:52.990]  So NAN poisoning attacks are pretty rare because it's typically difficult to actually introduce
[19:52.990 --> 19:59.210]  NAN in the first place. You don't usually have an opportunity to divide by zero or take the square root
[19:59.210 --> 20:03.970]  of a negative number or anything like that. These aren't common bugs,
[20:04.510 --> 20:11.610]  but when we're doing remote procedure calls, we can use any argument of the correct type.
[20:11.610 --> 20:16.890]  So if the RPC calls for a floating point number, we can give it any floating point number,
[20:16.890 --> 20:24.970]  including NAN or infinity. So going back to our movement RPC, there's only one argument that is a
[20:24.970 --> 20:30.550]  floating point number, and that's the timestamp. So what happens if our timestamp is NAN? Well,
[20:30.550 --> 20:36.270]  to figure this out, we have to look at this mouthful of a function,
[20:36.270 --> 20:41.430]  UCharacterMovementComponent::IsClientTimeStampValid.
[20:41.430 --> 20:44.350]  So this is a lot to look at, so we're going to go through it together. But let's assume that
[20:44.350 --> 20:51.530]  our timestamp is NAN, because that is the value we've provided to our RPC. So we get to this first
[20:51.530 --> 20:56.190]  check, which is intended to ensure that the timestamp is greater than zero. And because
[20:56.190 --> 21:01.430]  this is an affirmative comparison, this will evaluate to false, and we just skip right past
[21:01.430 --> 21:08.090]  to the next line. Then we're going to calculate our delta timestamp, and this is done by subtracting
[21:08.090 --> 21:14.930]  the last valid timestamp that was received from our connection from our provided timestamp.
[21:14.970 --> 21:20.690]  But in this case, it does not matter what that last valid timestamp was, because our
[21:20.690 --> 21:28.290]  provided timestamp is NAN, and NAN minus anything will always be NAN. So delta timestamp evaluates
[21:28.290 --> 21:34.430]  to NAN. Then we get to these next two checks. In the first case, again, we're going to pass
[21:34.430 --> 21:41.190]  right by because timestamp is NAN, and this is an affirmative comparison. And for the next check,
[21:41.190 --> 21:46.450]  delta timestamp is also NAN. This is another affirmative comparison. We pass right by,
[21:46.450 --> 21:53.210]  and our timestamp is considered to be quote-unquote valid. So this is all written in
[21:53.210 --> 21:59.310]  such a way that just by pure luck, NAN will just pass right through and be considered valid. And so
[21:59.310 --> 22:06.230]  the next thing we do is we generate our delta time using NAN. So our delta time is calculated,
[22:06.230 --> 22:13.930]  again, by subtracting our last valid timestamp from our provided timestamp.
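Here is a simplified Python model of the validity check walked through above. It is not the engine's code: the exact conditions, the `RESET_THRESHOLD` value and the function name are assumptions chosen to mirror the shape the talk describes, where every rejection is an affirmative comparison.

```python
RESET_THRESHOLD = 120.0  # hypothetical limit on how far a timestamp may jump


def is_timestamp_valid(timestamp, last_valid_timestamp):
    """Model of a validity check built only from affirmative comparisons."""
    if timestamp <= 0:
        return False
    delta_timestamp = timestamp - last_valid_timestamp
    if timestamp <= last_valid_timestamp:  # outdated move
        return False
    if abs(delta_timestamp) > RESET_THRESHOLD:  # implausible jump
        return False
    return True


nan = float("nan")
first_call = is_timestamp_valid(nan, 10.0)  # every rejection test is false
```

In this model `first_call` is `True`: none of the three rejection conditions can fire when the timestamp is NaN, so the value is treated as valid and would then be saved as the last valid timestamp.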
[22:13.930 --> 22:21.170]  And because our provided timestamp is NAN, delta time will evaluate to NAN regardless of what our
[22:21.170 --> 22:28.490]  last valid timestamp was. Now the server will use our delta time to attempt to apply our movement,
[22:28.490 --> 22:33.630]  but this is where we run into our first issue. There's one last sanity check here to ensure that
[22:33.630 --> 22:39.930]  the delta time is greater than zero, and we will never pass this check when delta time is NAN
[22:39.930 --> 22:45.330]  because it's another affirmative comparison. So our movement is not applied, and even though
[22:45.330 --> 22:51.770]  our timestamp was considered valid, we don't go anywhere. We don't move. But we're not quite
[22:51.770 --> 22:57.390]  done yet because we've caused another value to be poisoned. We've caused ServerData.CurrentClientTimeStamp
[22:57.390 --> 23:04.130]  to become NAN, and this is the saved version of the last valid timestamp. Because our
[23:04.130 --> 23:10.630]  timestamp was considered valid, even though we didn't apply our movement, NAN was shifted into
[23:10.630 --> 23:16.490]  that CurrentClientTimeStamp variable. Now we need to look at that IsClientTimeStampValid
[23:16.490 --> 23:21.750]  function again. So we're looking at the same function again, but this time we're going to
[23:21.750 --> 23:27.950]  assume that our given timestamp is not NAN. It's not any special number. It's just a regular floating
[23:27.950 --> 23:34.070]  point number greater than zero. So we bypass this first check. We get to the delta timestamp
[23:34.070 --> 23:40.170]  calculation. And again, we're going to do our given timestamp minus our last valid timestamp,
[23:40.170 --> 23:47.750]  but this time our last valid timestamp is NAN. So delta timestamp is still going to calculate as NAN.
[23:47.750 --> 23:53.450]  Then we get to these two comparisons, which just as before, we're going to bypass because
[23:53.450 --> 24:00.310]  ServerData.CurrentClientTimeStamp and delta timestamp are both NAN. So we pass these checks. And
[24:00.310 --> 24:07.430]  again, our timestamp is considered valid. So on this second RPC call, any timestamp greater than
[24:07.430 --> 24:12.330]  zero will pass the validity check. We could say it's 30 years in the future. It doesn't matter
[24:12.330 --> 24:19.350]  as long as it is greater than zero. If our first RPC call used NAN as a timestamp, our second call
[24:19.350 --> 24:25.350]  will always pass the validation check. Unfortunately, our delta time is still going to
[24:25.350 --> 24:31.830]  calculate as NAN because our old client timestamp was NAN. So still nothing happens. We haven't
[24:31.830 --> 24:38.690]  moved an inch, but fortunately we've poisoned one more value now. So while all this is going on,
[24:38.690 --> 24:44.650]  the server is trying to determine if our time has drifted from server time. Essentially the
[24:44.650 --> 24:50.290]  server wants to make sure that we're not doing anything tricky and that
[24:50.290 --> 24:56.750]  our clock isn't drifting. And it does this by independently calculating its own delta time
[24:56.750 --> 25:03.590]  and calculating an error rate, a difference between our delta time and the server's delta time.
[25:03.590 --> 25:10.670]  But because our delta time is NAN, our client error is also going to be NAN. And then this
[25:10.670 --> 25:18.010]  client error is used to build up our cumulative value new time discrepancy. And this value is
[25:18.010 --> 25:24.290]  used to detect when we have drifted too far from server time. This is used to detect speed hacking
[25:24.290 --> 25:31.970]  attempts. It's also used to detect when a client might just be lagging too much. But because our
[25:31.970 --> 25:39.270]  client error is NAN, when our client error is added, our new time discrepancy value also becomes
[25:39.270 --> 25:47.270]  NAN. And because new time discrepancy is what's used to determine a difference between client
[25:47.270 --> 25:53.650]  time and server time, once we've poisoned this value, we can essentially disable the server's
[25:53.650 --> 25:59.650]  ability to detect the time discrepancy for our connection. And so when we actually get to the
[25:59.650 --> 26:06.910]  point when the server would attempt to detect a time discrepancy between our time and server time,
[26:06.910 --> 26:13.790]  this check will never pass because new time discrepancy is NAN. And more importantly, no
[26:13.790 --> 26:19.550]  mathematical operation will ever cause new time discrepancy to become a regular floating point
[26:19.550 --> 26:26.710]  number again. It is stuck as NAN. So at this point, because we've neutered the server's ability
[26:26.710 --> 26:33.130]  to detect a time discrepancy, we can now pull off an old school speed hack where we just
[26:33.130 --> 26:39.750]  speed up time in order to move faster than we should. And what this allows us to do is it allows
[26:39.750 --> 26:46.570]  us to move significantly faster than our built-in limitations would ever allow. So I've oversimplified
[26:46.570 --> 26:52.190]  this process a bit. I know it's still pretty complicated, but what I've come up with as not
[26:52.190 --> 26:58.710]  the most efficient but the most straightforward way of exploiting this is to just send RPCs in
[26:58.710 --> 27:05.450]  groups of three, where in the first RPC the timestamp is NAN, in the second the timestamp is just slightly
[27:05.450 --> 27:12.190]  greater than zero, and in the third the timestamp is some value well in the future. And every time you
[27:12.190 --> 27:18.130]  send this grouping, it will move you forward some amount. And because the server cannot detect the
[27:18.130 --> 27:23.130]  discrepancy anymore, you can do this as often as you want. And the only real limitation is how
[27:23.130 --> 27:30.570]  quickly you can send those RPCs and have them be processed. So saying all that, let's actually look
[27:30.570 --> 27:35.350]  at this in action. And I think you can probably imagine what this is going to look like, but I
[27:35.350 --> 27:44.290]  worked hard on it, so humor me. This is our first demo. This is built on a
[27:44.290 --> 27:50.670]  stock Unreal Engine 4 game template. All I've done with it is I've enabled all the speed
[27:50.670 --> 27:55.510]  hacking protections, I've opened up the game world just a bit so we can run around some more,
[27:55.510 --> 28:02.170]  and I've modified the client to actually pull off our attack. And this is filmed from the server
[28:02.170 --> 28:07.530]  perspective, just so that you can see that this is actually happening server-side. It doesn't just
[28:07.530 --> 28:13.350]  look like we're moving fast client-side. So in the background, we've got our hacker. As you can see,
[28:13.350 --> 28:18.510]  he's vibrating. He's very excited to be here. And what we're going to do is we're just going to get
[28:18.510 --> 28:23.490]  out of the way so we can get a good view of him running. And in just a second, we are going to
[28:23.490 --> 28:32.070]  see him blast off. And he's gone. Okay, so that is a little bit faster movement than intended,
[28:32.070 --> 28:38.170]  even with the speed hacking protections on. I really like this bug. I love floating point bugs,
[28:38.170 --> 28:42.790]  and the great thing about this one is that it actually does something other than just being
[28:43.050 --> 28:48.570]  a denial of service. I also think that this type of attack can apply in other ways. I've looked at
[28:48.570 --> 28:55.370]  other games that use floating point in their RPCs, and you can usually get some sort of unintended
[28:56.130 --> 29:01.290]  behavior by using these special floating point numbers. It's not always useful. It doesn't always
[29:01.290 --> 29:07.110]  do anything for us as an attacker, but it usually does something. I should also say that this does
[29:07.110 --> 29:16.330]  still apply to Unity, but with UNet, it doesn't limit your movement in the first place, so you
[29:16.330 --> 29:23.250]  don't have to go through this complicated process to speed hack. So with that out of the way,
[29:23.250 --> 29:28.810]  let's talk about our final bug. We're going to go back to UNet, and this is a session hijacking bug.
[29:29.530 --> 29:36.270]  So UNet uses a protocol-level process to authenticate incoming packets, because UNet
[29:36.270 --> 29:44.470]  is implemented over UDP, and you don't get the benefits of TCP, where a stream is already
[29:44.470 --> 29:50.310]  authenticated, and you can assume that a packet is part of a stream that you've already seen before.
[29:50.310 --> 29:55.070]  So packets are not validated by their source IP address, or their source port, or anything like
[29:55.070 --> 30:01.610]  that. They're only validated by values within the packet itself. So knowing this, it is at least
[30:01.610 --> 30:07.210]  theoretically possible that someone could hijack another player's session totally remotely, totally
[30:07.210 --> 30:11.750]  over the internet. We don't need a man in the middle, we don't need to be over LAN, or anything
[30:11.750 --> 30:19.790]  like that. And that's the plan here. So when UNet validates an incoming packet, there are three
[30:19.790 --> 30:25.510]  important values it looks at. The first is the host ID, then the session ID, and the packet ID.
[30:25.510 --> 30:29.570]  And these names don't mean anything on their own, so let's look at each one in detail.
[30:30.670 --> 30:36.690]  So the first of these values is the host ID, and this is a 16-bit integer that's used to associate
[30:37.150 --> 30:43.570]  a packet with a given client. Host IDs are assigned sequentially, starting at 1. So the first client
[30:43.570 --> 30:50.610]  to connect gets host ID 1, the second client gets host ID 2, etc. Now host IDs aren't really intended
[30:50.610 --> 30:56.030]  to be a secret, so in a way we can sort of just ignore them. It's also really easy to enumerate
[30:56.030 --> 31:01.570]  the host ID of another player. If we're the second player in a game and we get host ID 2, the other
[31:01.570 --> 31:08.850]  player is probably host ID 1. If we're in a battle royale and we're host ID 47, the other host IDs
[31:08.850 --> 31:15.730]  are probably 1 through 46 and 48 through 100. Things get a little more complicated when we talk
[31:15.730 --> 31:21.970]  about the next value, which is the session ID. The session ID is the primary authenticating secret
[31:21.970 --> 31:27.830]  of a connection, and the session ID is randomly generated by the client when they connect.
[31:27.830 --> 31:32.650]  And every packet received for that client has to have the correct session ID, or the packet
[31:32.650 --> 31:38.250]  will be discarded. There are a couple problems with the session ID though. The first is that
[31:38.250 --> 31:44.610]  the session ID is also a 16-bit integer. It also can't be zero. This means that there are only
[31:44.610 --> 31:53.150]  65,535 possible session IDs. There's also no penalty for incorrectly guessing a session ID,
[31:53.150 --> 31:58.030]  other than the fact that our packet will just be discarded by the server. So we can easily brute
[31:58.030 --> 32:03.930]  force the whole range of session IDs, even over the open internet. It doesn't matter. It doesn't
[32:03.930 --> 32:10.070]  take long at all. Now we don't really need to do this part, but there is one more thing we can do
[32:10.070 --> 32:16.310]  to narrow down that search even more. So session IDs are generated with a function named
[32:16.310 --> 32:22.670]  unet getRandNotZero. And what this function does is exactly what the name says. It gets a random
[32:22.670 --> 32:28.210]  number and ensures that that random number is not zero. And this is used for session IDs because
[32:28.210 --> 32:36.810]  session IDs cannot be zero. But so the way this function actually works is it takes the end result,
[32:36.810 --> 32:42.170]  that random number, and it ORs it with one. It essentially ensures that the least significant bit
[32:42.170 --> 32:48.470]  of that output will always be one. This has the effect of ensuring that a legitimate UNet client
[32:48.470 --> 32:54.990]  will only ever generate an odd-numbered session ID. Technically, a session ID can be any 16-bit
[32:54.990 --> 33:00.730]  value other than zero, but a legitimate client is programmed to never actually generate one unless
[33:00.730 --> 33:08.590]  it's odd. So this reduces the possible session IDs down to 32,768, so 50%.
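The odd-only behavior described above can be sketched in a few lines of Python. This is a minimal illustration and not UNet's actual code: the function name `get_rand_not_zero` simply mirrors the name mentioned in the talk, and the use of Python's `random` module is an assumption made for the example.

```python
import random

ID_BITS = 16  # session IDs are 16-bit values, per the talk


def get_rand_not_zero(rng=random):
    """Hypothetical stand-in for the generator described in the talk.

    Takes a random 16-bit value and ORs it with 1, which guarantees a
    non-zero result but also forces the least significant bit to 1.
    """
    return rng.getrandbits(ID_BITS) | 1


# Every value a session ID is technically allowed to take (non-zero, 16-bit).
all_session_ids = range(1, 1 << ID_BITS)

# Every value a client following the OR-with-1 scheme can actually produce.
odd_session_ids = range(1, 1 << ID_BITS, 2)

if __name__ == "__main__":
    print(len(all_session_ids))  # 65535
    print(len(odd_session_ids))  # 32768
```

Under these assumptions, a guesser only needs to walk `odd_session_ids`, which is roughly half of the full range.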
[33:10.010 --> 33:15.710]  So if we know the host ID and we can guess the session ID, that's all we need for our spoof
[33:15.710 --> 33:21.310]  packet to be accepted within the context of another player's session. But there is one more hiccup,
[33:21.310 --> 33:26.370]  that's the packet ID. The packet ID is an integer that's incremented with each packet
[33:26.370 --> 33:31.530]  sent by the client, basically a sequence number. And again, it's 16 bits long.
[33:32.450 --> 33:38.150]  So what the packet ID is for is it's used to detect duplicate or out-of-order packets because,
[33:38.150 --> 33:44.670]  again, UDP isn't doing this for us. It's also used to determine the rate of packet loss.
[33:44.670 --> 33:50.570]  So if the last packet ID received by the server is one and the next packet ID it gets is 1000,
[33:50.570 --> 33:58.830]  the server is going to assume that it's lost 998 packets in the meantime.
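The packet-loss arithmetic in that example can be written out as a short Python sketch. The function name `packets_assumed_lost` is hypothetical, and treating the counter as wrapping modulo 2^16 is an assumption based on the packet ID being 16 bits long.

```python
ID_SPACE = 1 << 16  # packet IDs are 16 bits, so the counter wraps at 65536


def packets_assumed_lost(last_id, new_id):
    """Number of packet IDs skipped between two consecutively received packets.

    Hypothetical helper: assumes new_id is ahead of last_id and that the
    counter wraps around modulo 2**16. Duplicates (new_id == last_id) are
    not handled here.
    """
    return (new_id - last_id - 1) % ID_SPACE


if __name__ == "__main__":
    print(packets_assumed_lost(1, 1000))    # 998, the example from the talk
    print(packets_assumed_lost(0xFFFF, 0))  # 0, a normal wrap from FFFF to 0
```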
[34:00.970 --> 34:07.910]  So if we can determine the host ID and guess the session ID, what can we do with the packet ID?
[34:07.910 --> 34:12.330]  I guess a better question might be, what happens if we just send a random packet ID?
[34:12.330 --> 34:19.550]  So let's read Unity's documentation on exactly this. So according to the documentation,
[34:19.550 --> 34:26.050]  there are a few conditions. If the new packet ID is greater than the last packet ID plus 512,
[34:26.050 --> 34:30.310]  we're going to disconnect the session because we've lost too many packets.
[34:30.390 --> 34:37.230]  If the packet ID is more than 512 behind the current packet ID, we're going to discard it
[34:37.230 --> 34:43.930]  because it's too old. We don't want it anymore. If the packet ID is in a list of packets we've
[34:43.930 --> 34:49.710]  seen recently, it's a duplicate, let's discard it. Otherwise, if none of these conditions are met,
[34:49.710 --> 34:55.090]  we're going to accept and process that packet. So there are a couple interesting things about
[34:55.090 --> 35:01.810]  this. The first is that if our guessed packet ID is greater than our last packet ID plus 512,
[35:01.810 --> 35:06.410]  the connection will be disconnected. I want to emphasize that this is not our connection we're
[35:06.410 --> 35:11.030]  talking about. This is the other player's connection. So this is useful because it
[35:11.030 --> 35:17.630]  means we can pretty trivially kick other players off the server. However, it would be a lot more
[35:17.630 --> 35:24.590]  interesting if we could bypass this check and inject a packet that would be actually executed
[35:24.590 --> 35:30.610]  within another player's session. But reading the documentation, it seems like the odds of this
[35:30.610 --> 35:37.710]  happening are pretty low. Guessed packet ID must be last packet ID plus or minus 512, which doing
[35:37.710 --> 35:44.550]  the math gives us less than a 7% chance of success. That's pretty bad. But when we look at the actual
[35:44.550 --> 35:50.990]  implementation, it tells a slightly different story. So packet ID validation is done by the
[35:50.990 --> 35:57.650]  function unet replay protector is packet replayed. In practice, this function actually does not
[35:57.650 --> 36:03.730]  discard packets that are more than 512 packets old, like the documentation said. I spent a lot
[36:03.730 --> 36:08.350]  of time thinking that I was just misunderstanding or that it was more complicated than I thought.
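For reference, the documented checks listed a moment ago can be sketched in Python as follows. This models only the behavior the documentation describes, as paraphrased in the talk, and not the real implementation. The names `documented_verdict` and `documented_accept_fraction` are hypothetical, counter wraparound is ignored, and the list of recently seen IDs is assumed to be empty when estimating the odds.

```python
from enum import Enum

WINDOW = 512         # tolerance on either side of the last packet ID
ID_SPACE = 1 << 16   # packet IDs are 16 bits


class Verdict(Enum):
    DISCONNECT = "disconnect"
    DISCARD = "discard"
    ACCEPT = "accept"


def documented_verdict(new_id, last_id, recent_ids):
    """Apply the documented rules in order (wraparound ignored)."""
    if new_id > last_id + WINDOW:
        return Verdict.DISCONNECT   # too far ahead: too many packets lost
    if new_id < last_id - WINDOW:
        return Verdict.DISCARD      # too far behind: too old
    if new_id in recent_ids:
        return Verdict.DISCARD      # seen recently: duplicate
    return Verdict.ACCEPT


def documented_accept_fraction(last_id=0x8000):
    """Fraction of uniformly random guesses the documented rules accept."""
    accepted = sum(
        documented_verdict(guess, last_id, frozenset()) is Verdict.ACCEPT
        for guess in range(ID_SPACE)
    )
    return accepted / ID_SPACE


if __name__ == "__main__":
    print(documented_verdict(1000, 1, frozenset()))
    print(f"{documented_accept_fraction():.2%}")
```

Under these assumptions a blind guess is accepted only about 1.6% of the time, which is well within the "less than 7%" figure quoted above.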
[36:08.350 --> 36:14.750]  But no, the logic for discarding old packets just isn't there. Instead, old packets are accepted as
[36:14.750 --> 36:21.270]  if they weren't more than 512 packets old. So unfortunately, it's a little more complicated
[36:21.270 --> 36:27.870]  than it sounds. We can't just use packet ID zero every time, even though that would be the lowest
[36:27.870 --> 36:34.150]  packet ID. And the reason for that is that the server has to account for cases where the packet
[36:34.150 --> 36:40.510]  ID overflows, goes from FFFF to zero. And this happens pretty often over the course of a game.
[36:40.510 --> 36:46.410]  So instead of just directly comparing the numeric value, the server has to keep
[36:46.410 --> 36:53.850]  like a rolling window of packet IDs to determine if a packet is old or new. And so doing the math
[36:53.850 --> 37:01.250]  here, we have what's very close to a 50-50 shot that a packet ID will be accepted. Most of the
[37:01.250 --> 37:06.010]  rest of the time, our packet is going to cause the other player to get kicked out of the game,
[37:06.010 --> 37:11.470]  which is still pretty useful. And I say most of the rest of the time, because occasionally
[37:11.930 --> 37:16.910]  we'll guess a packet ID that was actually seen recently, and it'll be seen as a duplicate and
[37:16.910 --> 37:22.930]  just be discarded. Okay, so let's look at our second demo. And for this, I had to use an actual
[37:22.930 --> 37:28.790]  game, not just something I came up with myself. And this game is called Streets of Rogue. It's
[37:28.790 --> 37:35.750]  very cool. You should buy it. And I do want to emphasize that this is not a bug in the game
[37:35.750 --> 37:40.630]  itself. There's nothing wrong with how this game is programmed. This is a bug in UNet. But even
[37:40.630 --> 37:47.490]  still, what we're going to do with this bug is we're going to inject packets to cause
[37:47.490 --> 37:53.830]  two other players to do actions that we tell them to. So this little guy in the red shirt is me,
[37:53.830 --> 37:59.450]  and these two identical looking people are our target players. And what we're going to do is
[37:59.450 --> 38:03.250]  we're going to inject packets. We're going to see them do a few things. First, they're going to say
[38:03.250 --> 38:09.210]  hi to everyone. Then we're going to kill them both. Then I started to feel a little bit bad,
[38:09.210 --> 38:12.310]  so we're going to resurrect them both. And then we're going to kill them again.
[38:14.030 --> 38:20.490]  Okay, so this is a pretty simple demo. Kind of stupid, but it demonstrates a few things.
[38:20.490 --> 38:26.350]  The first is that we're able to inject packets that are actually accepted within the context
[38:26.350 --> 38:31.270]  of another player's session. The second is that we're able to do it consistently enough that
[38:31.270 --> 38:35.710]  we're able to do it for two players at the exact same time, even though those two players have
[38:35.710 --> 38:41.330]  different host IDs, different session IDs, different packet IDs, everything. And finally,
[38:41.330 --> 38:46.210]  we're able to do it consistently enough that we're able to inject multiple packets into each
[38:46.210 --> 38:52.070]  player's session. We were able to inject a packet to say something, pause, inject another packet to
[38:52.610 --> 38:58.050]  cause them to explode, pause, and then do this four times, injecting four different packets.
[38:58.050 --> 39:06.430]  So this bug is pretty reliable. And as I said, even if you lose the coin toss, all that happens
[39:06.430 --> 39:11.130]  is you kick another player out of the game, which has its own utility.
[39:11.830 --> 39:17.450]  So let's talk about remediations for this bug. This is considered to be an architectural weakness
[39:17.450 --> 39:24.590]  with UNet. The actual fixes that would be required to prevent this entirely are not going
[39:24.590 --> 39:31.490]  to happen. The only mitigation, aside from moving away from UNet, is to actually encrypt UNet.
[39:32.070 --> 39:37.790]  Unity does provide a reference implementation that does a decent job of this, but it's not
[39:37.790 --> 39:44.350]  complete. It doesn't do key exchange for you. So to an extent, you are still on your own with this.
[39:44.410 --> 39:49.970]  And I've looked a lot, but I've not found a single game implementing encryption over UNet.
[39:49.990 --> 39:56.910]  And I should emphasize that this is a mitigation. It is not a fix. So if your encryption is not
[39:57.570 --> 40:05.410]  complete, if an attacker can bypass this encryption, these bugs still exist exactly
[40:05.410 --> 40:12.150]  the same as they exist now. Okay, so that was my final bug. So let's talk about some future work.
[40:12.150 --> 40:18.190]  I don't think I've found all the bugs, even in the components that I have looked at,
[40:18.190 --> 40:22.730]  but there are a few components that I haven't looked at at all that I think are worth looking
[40:22.730 --> 40:27.330]  at. So both of these protocols have other transport modes. I think the most interesting
[40:27.330 --> 40:31.210]  of these are WebSockets, because that's what browser games are going to be using.
[40:32.190 --> 40:37.930]  There's also third-party networking plugins for some of these engines. Photon and Mirror are two
[40:37.930 --> 40:43.950]  relatively common ones for Unity. And there's just other engines, GameMaker Studio, Godot,
[40:43.950 --> 40:52.170]  stuff like that. Finally, I want to thank a few people, both Epic Games and Unity Technologies.
[40:52.170 --> 40:58.170]  Their security teams were absolutely fantastic, super communicative, kept me up to date the entire
[40:58.170 --> 41:05.390]  time, put up with a lot from me, which can be very annoying. I have nothing but good things to say
[41:05.390 --> 41:10.410]  about both of them. I also want to thank the artist who made the background art for my
[41:10.410 --> 41:17.610]  presentation, Grigorin. He's awesome. And finally, all the scripts, everything I've put together
[41:17.610 --> 41:24.130]  for this work, I've got up on GitHub. I've got POCs for most of these issues. I've got
[41:24.130 --> 41:32.690]  libraries for interfacing with UNet and Unreal's protocol. And I hope that if I can ask anything
[41:32.690 --> 41:37.050]  of you, it's that you go, you use something that you've learned here to go get banned from
[41:37.050 --> 41:40.630]  some video game. All right. Thank you, everyone.
